Overview

Dataset Statistics

Number of Variables 24
Number of Rows 100000
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 62.7 MB
Average Row Size in Memory 657.2 B
Variable Types
  • Geography: 1
  • Categorical: 14
  • Numerical: 8
  • DateTime: 1

Dataset Insights

sales is skewed [Skewed]
regular_price is skewed [Skewed]
ratio is skewed [Skewed]
rgb_g_main_col is skewed [Skewed]
article_id_1 has a high cardinality: 477 distinct values [High Cardinality]
article_id_1 has constant length 6 [Constant Length]
promo_media_ads has constant length 1 [Constant Length]
promo_store_event has constant length 1 [Constant Length]
article_id_2 has constant length 6 [Constant Length]
rgb_r_sec_col has constant length 3 [Constant Length]
rgb_g_sec_col has constant length 3 [Constant Length]
rgb_b_sec_col has constant length 3 [Constant Length]
label has constant length 1 [Constant Length]
rgb_b_main_col has 10000 (10.0%) zeros [Zeros]

Variables


country

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 7184540

Length

Mean 6.8454
Standard Deviation 0.3615
Median 7
Minimum 6
Maximum 7

Sample

1st row Germany
2nd row Germany
3rd row Germany
4th row Germany
5th row Germany

Letter

Count 684540
Lowercase Letter 584540
Space Separator 0
Uppercase Letter 100000
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Germany, Austria) take over 50.0%
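The length statistics for country are consistent with only two name lengths being present. As a rough check, and assuming every value is exactly 6 or 7 characters long (an assumption; the report only gives the minimum and maximum), the share of 7-character names, the standard deviation, and the letter count can be recovered from the mean:

```python
import math

# Values copied from the report
mean_length = 6.8454
n_rows = 100_000

# Assumption: every country name has length 6 or 7, so the mean equals 6 + p
p_seven = mean_length - 6

# Standard deviation of a two-point variable taking values 6 and 7
std_length = math.sqrt(p_seven * (1 - p_seven))

# Total number of characters across all rows
total_chars = round(mean_length * n_rows)

print(f"share of 7-character names: {p_seven:.4f}")
print(f"implied standard deviation: {std_length:.4f}")
print(f"implied character count: {total_chars}")
```

Under that assumption the implied standard deviation and character count agree with the reported 0.3615 and 684540.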

article_id_1

categorical

Approximate Distinct Count 477
Approximate Unique (%) 0.5%
Missing 0
Missing (%) 0.0%
Memory Size 7100000

Length

Mean 6
Standard Deviation 0
Median 6
Minimum 6
Maximum 6

Sample

1st row YN8639
2nd row YN8639
3rd row YN8639
4th row YN8639
5th row YN8639

Letter

Count 200000
Lowercase Letter 0
Space Separator 0
Uppercase Letter 200000
Dash Punctuation 0
Decimal Number 400000
  • article_id_1 has words of constant length
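The row names in the Letter table match Unicode general categories (Lu, Ll, Zs, Pd, Nd). Whether the report tool computes them this way is an assumption; the sketch below only shows how such counts could be reproduced with the Python standard library. The helper name char_classes is hypothetical.

```python
import unicodedata
from collections import Counter

def char_classes(values):
    """Count Unicode general categories over every character in values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Illustrative input: the sample value shown in the report, repeated 5 times
sample = ["YN8639"] * 5
counts = char_classes(sample)

# Lu = uppercase letter, Nd = decimal number, Ll = lowercase letter,
# Zs = space separator, Pd = dash punctuation
print(counts["Lu"], counts["Nd"], counts["Ll"], counts["Zs"], counts["Pd"])
```

Each 6-character identifier of this shape contributes 2 uppercase letters and 4 digits, which over 100000 rows matches the reported 200000 and 400000.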

sales

numerical

Approximate Distinct Count 476
Approximate Unique (%) 0.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1600000
Mean 56.7818
Minimum 1
Maximum 898
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • sales is skewed right (γ1 = 3.8588)

Quantile Statistics

Minimum 1
5-th Percentile 2
Q1 10
Median 26
Q3 64
95-th Percentile 216
Maximum 898
Range 897
IQR 54

Descriptive Statistics

Mean 56.7818
Standard Deviation 87.9347
Variance 7732.5191
Sum 5.6782e+06
Skewness 3.8588
Kurtosis 20.6563
Coefficient of Variation 1.5486
  • sales is not normally distributed (p-value 2.1632548404864911e-19)
  • sales has 9540 outliers
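Several of the sales figures are derived from one another, which a short calculation can confirm. The outlier fences below assume Tukey's 1.5 × IQR rule; the report does not state which rule it uses, so treat that part as an illustration only.

```python
# Values copied from the report for `sales`
mean, std = 56.7818, 87.9347
q1, q3 = 10, 64

iqr = q3 - q1            # interquartile range
cv = std / mean          # coefficient of variation
variance = std ** 2      # differs slightly from the report because std is rounded

# Assumption: Tukey's 1.5 * IQR rule (not confirmed by the report)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr}, CV = {cv:.4f}, variance = {variance:.1f}")
print(f"fences: [{lower_fence}, {upper_fence}]")
```

Under this assumed rule the lower fence is negative, so only large values can be flagged. The 95th percentile (216) lies above the upper fence, which is consistent with more than 5% of rows being reported as outliers.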

regular_price

numerical

Approximate Distinct Count 123
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1600000
Mean 52.3912
Minimum 3.95
Maximum 197.95
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • regular_price is skewed right (γ1 = 0.9037)

Quantile Statistics

Minimum 3.95
5-th Percentile 6.95
Q1 25.95
Median 40.95
Q3 79.95
95-th Percentile 120.95
Maximum 197.95
Range 194
IQR 54

Descriptive Statistics

Mean 52.3912
Standard Deviation 35.2721
Variance 1244.123
Sum 5.2391e+06
Skewness 0.9037
Kurtosis 0.3223
Coefficient of Variation 0.6732
  • regular_price is not normally distributed (p-value 2.5600309989423665e-07)
  • regular_price has 280 outliers

current_price

numerical

Approximate Distinct Count 141
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1600000
Mean 28.2908
Minimum 1.95
Maximum 195.95
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • current_price is skewed right (γ1 = 1.5475)

Quantile Statistics

Minimum 1.95
5-th Percentile 3.95
Q1 11.95
Median 20.95
Q3 37.95
95-th Percentile 74.95
Maximum 195.95
Range 194
IQR 26

Descriptive Statistics

Mean 28.2908
Standard Deviation 22.5783
Variance 509.7816
Sum 2.8291e+06
Skewness 1.5475
Kurtosis 2.9166
Coefficient of Variation 0.7981
  • current_price is not normally distributed (p-value 3.3634153535616836e-06)
  • current_price has 4480 outliers

ratio

numerical

Approximate Distinct Count 2722
Approximate Unique (%) 2.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1600000
Mean 0.5456
Minimum 0.2965
Maximum 1
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • ratio is skewed right (γ1 = 0.3978)

Quantile Statistics

Minimum 0.2965
5-th Percentile 0.3028
Q1 0.3548
Median 0.525
Q3 0.6992
95-th Percentile 0.8887
Maximum 1
Range 0.7035
IQR 0.3444

Descriptive Statistics

Mean 0.5456
Standard Deviation 0.1944
Variance 0.03778
Sum 54564.5863
Skewness 0.3978
Kurtosis -0.9114
Coefficient of Variation 0.3562
  • ratio is not normally distributed (p-value 1.9785368819728973e-19)

retailweek

datetime

Approximate Distinct Count 123.1156
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 1600000
Minimum 2014-12-28 00:00:00
Maximum 2017-04-30 00:00:00
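The non-integer distinct count for retailweek is an estimate. A quick check of the date range, assuming one value every 7 days with no gaps (an assumption; the report does not list the individual dates), gives the number of weeks it should be close to:

```python
from datetime import date

# Minimum and maximum copied from the report
first_week = date(2014, 12, 28)
last_week = date(2017, 4, 30)

span_days = (last_week - first_week).days

# Assumption: one retailweek value every 7 days, with no missing weeks
weeks_inclusive = span_days // 7 + 1

print(span_days, span_days % 7, weeks_inclusive)
```

The span is a whole number of weeks, and the inclusive week count is in line with the reported distinct count of about 123.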

promo_media_ads

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6600000
  • The largest value (0) is over 15.16 times larger than the second largest value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 100000
  • The top 2 categories (0, 1) take over 50.0%
  • promo_media_ads has words of constant length

promo_store_event

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6600000
  • The largest value (0) is over 203.08 times larger than the second largest value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 100000
  • The top 2 categories (0, 1) take over 50.0%
  • promo_store_event has words of constant length

customer_id

numerical

Approximate Distinct Count 4549
Approximate Unique (%) 4.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1600000
Mean 2721.7265
Minimum 1
Maximum 5999
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • customer_id is skewed right (γ1 = 0.2438)

Quantile Statistics

Minimum 1
5-th Percentile 203
Q1 1017
Median 2091
Q3 4570.25
95-th Percentile 5721.05
Maximum 5999
Range 5998
IQR 3553.25

Descriptive Statistics

Mean 2721.7265
Standard Deviation 1908.0855
Variance 3.6408e+06
Sum 2.7217e+08
Skewness 0.2438
Kurtosis -1.4331
Coefficient of Variation 0.7011
  • customer_id is not normally distributed (p-value 1.614823539128391e-07)

article_id_2

categorical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 7100000

Length

Mean 6
Standard Deviation 0
Median 6
Minimum 6
Maximum 6

Sample

1st row OC6355
2nd row AP5568
3rd row CB8861
4th row LI3529
5th row GG8661

Letter

Count 200000
Lowercase Letter 0
Space Separator 0
Uppercase Letter 200000
Dash Punctuation 0
Decimal Number 400000
  • article_id_2 has words of constant length

productgroup

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 7370000
  • The largest value (SHOES) is over 3.0 times larger than the second largest value (HARDWARE ACCESSORIES)

Length

Mean 8.7
Standard Deviation 5.917
Median 5
Minimum 5
Maximum 20

Sample

1st row SHOES
2nd row SHORTS
3rd row HARDWARE ACCESSORI...
4th row SHOES
5th row SHOES

Letter

Count 850000
Lowercase Letter 0
Space Separator 20000
Uppercase Letter 850000
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (SHOES, HARDWARE ACCESSORIES) take over 50.0%

category

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 7420000

Length

Mean 9.2
Standard Deviation 3.8936
Median 8
Minimum 4
Maximum 16

Sample

1st row TRAINING
2nd row TRAINING
3rd row GOLF
4th row RUNNING
5th row RELAX CASUAL

Letter

Count 890000
Lowercase Letter 0
Space Separator 30000
Uppercase Letter 890000
Dash Punctuation 0
Decimal Number 0

cost_article_2

numerical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1600000
Mean 6.517
Minimum 1.29
Maximum 13.29
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • cost_article_2 is skewed right (γ1 = 0.0994)

Quantile Statistics

Minimum 1.29
5-th Percentile 1.29
Q1 2.29
Median 6.95
Q3 9.6
95-th Percentile 13.29
Maximum 13.29
Range 12
IQR 7.31

Descriptive Statistics

Mean 6.517
Standard Deviation 3.9147
Variance 15.3251
Sum 651700
Skewness 0.09935
Kurtosis -1.2873
Coefficient of Variation 0.6007
  • cost_article_2 is not normally distributed (p-value 0.00045385139444877845)

style

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 7050000
  • The largest value (regular) is over 1.67 times larger than the second largest value (wide)

Length

Mean 5.5
Standard Deviation 1.5
Median 5.5
Minimum 4
Maximum 7

Sample

1st row slim
2nd row regular
3rd row regular
4th row regular
5th row regular

Letter

Count 550000
Lowercase Letter 550000
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (regular, wide) take over 50.0%

sizes

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 8320000
  • The largest value (xxs,xs,s,m,l,xl,xxl) is over 9.0 times larger than the second largest value (xs,s,m,l,xl)

Length

Mean 18.2
Standard Deviation 2.4
Median 19
Minimum 11
Maximum 19

Sample

1st row xxs,xs,s,m,l,xl,xx...
2nd row xxs,xs,s,m,l,xl,xx...
3rd row xxs,xs,s,m,l,xl,xx...
4th row xxs,xs,s,m,l,xl,xx...
5th row xxs,xs,s,m,l,xl,xx...

Letter

Count 1240000
Lowercase Letter 1240000
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (xxs,xs,s,m,l,xl,xxl, xs,s,m,l,xl) take over 50.0%

gender

categorical

Approximate Distinct Count 4
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6980000
  • The largest value (women) is over 7.0 times larger than the second largest value (kids)

Length

Mean 4.8
Standard Deviation 0.7483
Median 5
Minimum 3
Maximum 6

Sample

1st row women
2nd row women
3rd row women
4th row kids
5th row women

Letter

Count 480000
Lowercase Letter 480000
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (women, kids) take over 50.0%

rgb_r_main_col

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6790000

Length

Mean 2.9
Standard Deviation 0.3
Median 3
Minimum 2
Maximum 3

Sample

1st row 205
2nd row 188
3rd row 205
4th row 205
5th row 138

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 290000

rgb_g_main_col

numerical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1600000
Mean 139.6
Minimum 26
Maximum 238
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • rgb_g_main_col is skewed left (γ1 = -0.4105)

Quantile Statistics

Minimum 26
5-th Percentile 26
Q1 104
Median 144
Q3 181
95-th Percentile 238
Maximum 238
Range 212
IQR 77

Descriptive Statistics

Mean 139.6
Standard Deviation 63.6418
Variance 4050.2805
Sum 1.396e+07
Skewness -0.4105
Kurtosis -0.7287
Coefficient of Variation 0.4559
  • rgb_g_main_col is not normally distributed (p-value 3.4554452465621546e-08)

rgb_b_main_col

numerical

Approximate Distinct Count 10
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1600000
Mean 133.5
Minimum 0
Maximum 250
Zeros 10000
Zeros (%) 10.0%
Negatives 0
Negatives (%) 0.0%
  • rgb_b_main_col is skewed left (γ1 = -0.2331)

Quantile Statistics

Minimum 0
5-th Percentile 0
Q1 57
Median 143
Q3 205
95-th Percentile 250
Maximum 250
Range 250
IQR 148

Descriptive Statistics

Mean 133.5
Standard Deviation 81.1487
Variance 6585.1159
Sum 1.335e+07
Skewness -0.2331
Kurtosis -1.213
Coefficient of Variation 0.6079
  • rgb_b_main_col is not normally distributed (p-value 0.00045385139444877845)

rgb_r_sec_col

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6800000

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 255
2nd row 255
3rd row 255
4th row 164
5th row 164

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 300000
  • The top 2 categories (205, 164) take over 50.0%
  • rgb_r_sec_col has words of constant length

rgb_g_sec_col

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6800000

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 187
2nd row 187
3rd row 187
4th row 211
5th row 211

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 300000
  • The top 2 categories (155, 187) take over 50.0%
  • rgb_g_sec_col has words of constant length

rgb_b_sec_col

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6800000

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row 255
2nd row 255
3rd row 255
4th row 238
5th row 238

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 300000
  • The top 2 categories (155, 238) take over 50.0%
  • rgb_b_sec_col has words of constant length

label

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6600000
  • The largest value (0) is over 6.18 times larger than the second largest value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 100000
  • The top 2 categories (0, 1) take over 50.0%
  • label has words of constant length
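For a two-class column such as label, the "times larger" ratio can be turned into class shares. The sketch below treats the reported ratio as exact, although the report only says "over 6.18 times", so the shares are approximate. The helper name class_shares is hypothetical.

```python
def class_shares(ratio):
    """Return (majority, minority) shares given count(majority) / count(minority)."""
    minority = 1 / (1 + ratio)
    return 1 - minority, minority

# Ratio copied from the report for `label`; treated as exact for illustration
majority, minority = class_shares(6.18)

print(f"label=0: {majority:.1%}, label=1: {minority:.1%}")
```

Roughly one row in seven carries label 1, so anyone modelling this column may want to account for the class imbalance.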

Interactions

Correlations

Missing Values